Lip-Reading Driven Deep Learning Approach for Speech Enhancement
Authors
Abstract
This paper proposes a novel lip-reading driven deep learning framework for speech enhancement. The approach leverages the complementary strengths of both deep learning and analytical acoustic modeling (a filtering-based approach), as compared to benchmark approaches that rely only on deep learning. The proposed audio-visual (AV) speech enhancement framework operates at two levels. In the first level, a deep learning-based lip-reading regression model is employed. In the second level, the lip-reading approximated clean-audio features are exploited, using an enhanced, visually-derived Wiener filter (EVWF), for estimating the clean audio power spectrum. Specifically, a stacked long-short-term memory (LSTM) based lip-reading regression model is designed to estimate clean audio features using only temporal visual features (i.e., lip reading), considering a range of prior visual frames. For clean speech spectrum estimation, a new filterbank-domain EVWF is formulated, which exploits the estimated speech features. The proposed EVWF is compared with conventional spectral subtraction and log-minimum mean-square error methods, using both ideal AV mapping and LSTM-driven AV mapping approaches. The potential of the proposed framework is evaluated under four different dynamic real-world scenarios (cafe, street junction, public transport, pedestrian area) at different SNR levels (ranging from low to high SNRs), using the Grid and ChiME3 corpora. For objective testing, perceptual evaluation of speech quality (PESQ) is used to evaluate the quality of the restored speech. For subjective testing, the standard mean-opinion-score (MOS) method is used with inferential statistics. Comparative simulation results demonstrate significant improvements in terms of both speech quality and speech intelligibility. Ongoing work is aimed at enhancing the accuracy and generalization capability of the deep learning-driven lip-reading model, and the contextual integration of AV cues, leading to context-aware, autonomous AV speech enhancement.
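The two levels described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' EVWF: the function names (`stack_prior_frames`, `estimate_noise_power`, `wiener_gains`, `enhance_frame`) are hypothetical, the LSTM regression itself is omitted (its estimated clean filterbank power is simply passed in), and estimating noise from leading, assumed speech-free frames is a common textbook assumption rather than the paper's formulation. The per-band gain is the classic Wiener rule W_b = S_b / (S_b + N_b), with S_b the estimated clean-speech power and N_b the estimated noise power in filterbank band b.

```python
from typing import List, Sequence


def stack_prior_frames(frames: Sequence[Sequence[float]], num_prior: int) -> List[List[float]]:
    """Level 1 input (hypothetical): concatenate visual frame t with its num_prior predecessors."""
    stacked: List[List[float]] = []
    for t in range(len(frames)):
        window: List[float] = []
        for k in range(t - num_prior, t + 1):
            # Before the first frame, repeat frame 0 (an assumed padding scheme).
            window.extend(frames[max(k, 0)])
        stacked.append(window)
    return stacked


def estimate_noise_power(noisy_frames: Sequence[Sequence[float]], num_init: int) -> List[float]:
    """Average filterbank power over the first num_init frames, assumed speech-free."""
    count = min(num_init, len(noisy_frames))
    if count <= 0:
        raise ValueError("need at least one initial frame")
    num_bands = len(noisy_frames[0])
    return [sum(frame[b] for frame in noisy_frames[:count]) / count for b in range(num_bands)]


def wiener_gains(clean_power_est: Sequence[float], noise_power_est: Sequence[float]) -> List[float]:
    """Per-band Wiener gain W_b = S_b / (S_b + N_b); 0 where both powers are 0."""
    gains: List[float] = []
    for s, n in zip(clean_power_est, noise_power_est):
        s, n = max(s, 0.0), max(n, 0.0)  # power estimates cannot be negative
        total = s + n
        gains.append(s / total if total > 0.0 else 0.0)
    return gains


def enhance_frame(noisy_power: Sequence[float], clean_power_est: Sequence[float],
                  noise_power_est: Sequence[float]) -> List[float]:
    """Level 2: filter one noisy filterbank frame with the Wiener gains.

    The gain scales amplitudes, so the power of each band is scaled by W_b ** 2.
    """
    gains = wiener_gains(clean_power_est, noise_power_est)
    return [g * g * y for g, y in zip(gains, noisy_power)]


if __name__ == "__main__":
    visual = [[1.0], [2.0], [3.0]]  # toy 1-D lip features for three frames
    print(stack_prior_frames(visual, num_prior=2))
    noisy = [[2.0, 4.0], [4.0, 8.0], [9.0, 3.0]]
    print(estimate_noise_power(noisy, num_init=2))
    print(enhance_frame([4.0, 2.0], [3.0, 0.0], [1.0, 2.0]))
```

For example, `enhance_frame([4.0, 2.0], [3.0, 0.0], [1.0, 2.0])` gives `[2.25, 0.0]`: the first band keeps 0.75 of its amplitude, while the second band, where no speech is estimated, is suppressed entirely. In a framework like the one described, `clean_power_est` would come from a regression model fed with windows such as those produced by `stack_prior_frames`.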
Related articles
Learning Visual Models for Lip Reading
This chapter describes learning techniques that are the basis of a "visual speech recognition" or "lipreading" system. Model-based vision systems currently have the best performance for many visual recognition tasks. For geometrically simple domains, models can sometimes be constructed by hand using CAD-like tools. Such models are difficult and expensive to construct, however, and are inadeq...
Deep Neural Network Approach for Single Channel Speech Enhancement Processing
Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language
Human lip-reading is a challenging task. It requires not only knowledge of the underlying language but also visual clues to predict spoken words. Experts need a certain level of experience in, and understanding of, visual expressions to decode spoken words. Nowadays, with the help of deep learning, it is possible to translate lip sequences into meaningful words. The speech recognition in the noi...
A Deep Learning Approach to Data-driven Parameterizations for Statistical Parametric Speech Synthesis
Nearly all Statistical Parametric Speech Synthesizers today use Mel Cepstral coefficients as the vocal tract parameterization of the speech signal. Mel Cepstral coefficients were never intended to work in a parametric speech synthesis framework, but as yet, there has been little success in creating a better parameterization that is more suited to synthesis. In this paper, we use deep learning a...
Lip Tracking Towards an Automatic Lip Reading Approach
The current era aims to make the interaction between humans and their artificial partners (computers) easier and more reliable. One current task is the use of vocal interaction. Speech recognition may be improved by visual information from the human face. In the literature, the lip shape and its movement are referred to as lip reading. Lip reading computing plays a vital role in a...
Journal
Journal title: IEEE Transactions on Emerging Topics in Computational Intelligence
Year: 2021
ISSN: 2471-285X
DOI: https://doi.org/10.1109/tetci.2019.2917039